
Working Close to Home: WIRE-Net's Hire Locally Program

Author: Patricia Markovich
Tony Proscio
Publication venue: Public/Private Ventures
Publication date: 09/09/1998

Hire Locally is an employment program that matches residents of Cleveland's west side with industrial jobs that employers would otherwise have searched far and wide to fill. The program is part of the nonprofit Westside Industrial Retention and Expansion Network, or WIRE-Net. This report documents the program's innovation in developing a sectoral strategy that meets labor market demands while also setting a broad agenda for community improvement. It also shares key program elements and recommendations to help make future programs more effective.

IssueLab

Optimal Estimation and Rank Detection for Sparse Spiked Covariance Matrices

Author: Cai Tony
Ma Zongming
Wu Yihong
Publication date: 01/04/2015

This paper considers sparse spiked covariance matrix models in the high-dimensional setting and studies the minimax estimation of the covariance matrix and the principal subspace, as well as minimax rank detection. The optimal rate of convergence for estimating the spiked covariance matrix under the spectral norm is established; this requires significantly different techniques from those used for estimating other structured covariance matrices, such as bandable or sparse covariance matrices. We also establish the minimax rate under the spectral norm for estimating the principal subspace, the primary object of interest in principal component analysis. In addition, the optimal rate for the rank detection boundary is obtained. This result also resolves the gap in a recent paper by Berthet and Rigollet [1], where the special case of rank one is considered.

arXiv.org e-Print Archive

PubMed Central

ScholarlyCommons@Penn
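
As a concrete picture of the model in the abstract above, the Python sketch below (using NumPy) builds a sparse spiked covariance matrix under the normalization Σ = VΛVᵀ + I_p, where V has r orthonormal columns whose nonzero entries sit in a small set of rows; r is the rank that the detection problem targets. It also computes the spectral-norm loss of the plain sample covariance. The unit noise level, the function names, and the use of the sample covariance as a baseline are illustrative assumptions; the sample covariance is not the rate-optimal estimator studied in the paper.

```python
import numpy as np


def sparse_spiked_covariance(p, support, spikes, rng):
    """Return (Sigma, V) with Sigma = V diag(spikes) V^T + I_p.

    Assumption: unit noise variance. V is p x r with orthonormal columns
    whose nonzero rows lie in `support`, so the leading eigenvectors of
    Sigma are row-sparse. Function name is hypothetical.
    """
    r = len(spikes)
    V = np.zeros((p, r))
    # Orthonormal basis of a random r-dimensional subspace of R^|support|.
    Q, _ = np.linalg.qr(rng.standard_normal((len(support), r)))
    V[support, :] = Q
    Sigma = V @ np.diag(spikes) @ V.T + np.eye(p)
    return Sigma, V


def spectral_norm_error(Sigma_hat, Sigma):
    """Operator (spectral) norm ||Sigma_hat - Sigma||_2."""
    return np.linalg.norm(Sigma_hat - Sigma, ord=2)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    p, n = 200, 100
    Sigma, V = sparse_spiked_covariance(p, list(range(10)), [5.0, 2.0], rng)
    X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
    S = X.T @ X / n  # sample covariance, mean assumed known to be zero
    print("spectral-norm error of sample covariance:", spectral_norm_error(S, Sigma))
```

The spikes shift exactly r eigenvalues of Σ above 1, which is why the number of spikes is the "rank" being detected.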

Optimal Rates of Convergence for Noisy Sparse Phase Retrieval via Thresholded Wirtinger Flow

Author: Cai T. Tony
Li Xiaodong
Ma Zongming
Publication date: 10/06/2015

This paper considers the noisy sparse phase retrieval problem: recovering a sparse signal $x \in \mathbb{R}^p$ from noisy quadratic measurements $y_j = (a_j' x)^2 + \epsilon_j$, $j = 1, \ldots, m$, with independent sub-exponential noise $\epsilon_j$. The goals are to understand the effect of the sparsity of $x$ on the estimation precision and to construct a computationally feasible estimator that achieves the optimal rates. Inspired by the Wirtinger Flow [12] proposed for noiseless and non-sparse phase retrieval, a novel thresholded gradient descent algorithm is proposed and shown to adaptively achieve the minimax optimal rates of convergence over a wide range of sparsity levels when the $a_j$'s are independent standard Gaussian random vectors, provided that the sample size is sufficiently large compared to the sparsity of $x$. Comment: 28 pages, 4 figures

arXiv.org e-Print Archive

ScholarlyCommons@Penn
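
The Python sketch below illustrates the general shape of thresholded gradient descent for this measurement model: screen coordinates, initialize spectrally on the selected ones, then alternate Wirtinger-Flow-style gradient steps on f(x) = (1/4m) Σ_j ((a_jᵀx)² − y_j)² with a thresholding step. It is a simplified sketch, not the paper's algorithm: it assumes the sparsity level k is known and replaces the paper's data-driven thresholding with top-k hard thresholding. The function names, the step size 0.1 and the iteration count are illustrative assumptions.

```python
import numpy as np


def hard_threshold(v, k):
    """Keep the k largest-magnitude entries of v and zero the rest."""
    out = np.zeros_like(v)
    keep = np.argsort(np.abs(v))[-k:]
    out[keep] = v[keep]
    return out


def thresholded_gradient_descent(A, y, k, step=0.1, iters=500):
    """Simplified sparse phase retrieval for y_j ~ (a_j' x)^2 (hypothetical name).

    Assumptions: rows of A are N(0, I), noise has mean zero, k is known,
    and top-k hard thresholding stands in for the paper's threshold rule.
    """
    m, p = A.shape
    phi2 = y.mean()  # E[y_j] = ||x||^2 under the Gaussian design
    # Screening: E[y_j a_jl^2] = ||x||^2 + 2 x_l^2, so large scores flag the support.
    scores = (y[:, None] * A**2).mean(axis=0)
    S = np.argsort(scores)[-k:]
    # Spectral init on S: E[y a_S a_S'] = ||x||^2 I + 2 x_S x_S', top eigenvector is along x_S.
    M = (A[:, S] * y[:, None]).T @ A[:, S] / m
    _, vecs = np.linalg.eigh(M)
    x = np.zeros(p)
    x[S] = vecs[:, -1] * np.sqrt(phi2)
    for _ in range(iters):
        Ax = A @ x
        # Gradient of f(x) = (1/4m) * sum_j ((a_j' x)^2 - y_j)^2
        grad = A.T @ ((Ax**2 - y) * Ax) / m
        x = hard_threshold(x - (step / phi2) * grad, k)
    return x
```

Because the measurements only depend on (a_jᵀx)², the signal is identifiable only up to a global sign, so errors should be measured as min(‖x̂ − x‖, ‖x̂ + x‖).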

Sparse PCA: Optimal rates and adaptive estimation

Author: Cai T. Tony
Ma Zongming
Wu Yihong
Publication venue: Institute of Mathematical Statistics
Publication date: 01/01/2013

Principal component analysis (PCA) is one of the most commonly used statistical procedures, with a wide range of applications. This paper considers both minimax and adaptive estimation of the principal subspace in the high-dimensional setting. Under mild technical conditions, we first establish the optimal rates of convergence for estimating the principal subspace, which are sharp with respect to all the parameters, thus providing a complete characterization of the difficulty of the estimation problem in terms of the convergence rate. The lower bound is obtained by calculating the local metric entropy and applying Fano's lemma. The rate-optimal estimator is constructed using aggregation, which, however, might not be computationally feasible. We then introduce an adaptive procedure for estimating the principal subspace which is fully data-driven and can be computed efficiently. It is shown that the estimator attains the optimal rates of convergence simultaneously over a large collection of parameter spaces. A key idea in our construction is a reduction scheme which reduces the sparse PCA problem to a high-dimensional multivariate regression problem. This method is potentially also useful for other related problems. Comment: Published in the Annals of Statistics (http://www.imstat.org/aos/) at http://dx.doi.org/10.1214/13-AOS1178 by the Institute of Mathematical Statistics (http://www.imstat.org)

arXiv.org e-Print Archive

CiteSeerX

ScholarlyCommons@Penn
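
The paper's estimators (an aggregation-based rate-optimal estimator and an adaptive procedure built on a reduction to multivariate regression) are not reproduced here. As a simpler, computationally cheap point of comparison, the Python sketch below implements diagonal thresholding, a different sparse PCA method due to Johnstone and Lu, together with the projection loss ‖V̂V̂ᵀ − VVᵀ‖_F, a basis-invariant way to measure error in estimating a principal subspace. It assumes mean-zero data with unit noise variance; the threshold constant alpha = 4 and the function names are illustrative assumptions.

```python
import numpy as np


def projection_loss(V_hat, V):
    """||V_hat V_hat^T - V V^T||_F; unchanged by rotating either orthonormal basis."""
    return np.linalg.norm(V_hat @ V_hat.T - V @ V.T, ord="fro")


def diagonal_thresholding_pca(X, r, alpha=4.0):
    """Sparse PCA by diagonal thresholding (Johnstone & Lu style), not the paper's method.

    Assumptions: rows of X are mean-zero with unit noise variance; alpha is
    an illustrative tuning constant. Returns (V_hat, selected coordinates).
    """
    n, p = X.shape
    S = X.T @ X / n
    # Step 1: keep coordinates whose sample variance is clearly above the noise level 1.
    selected = np.flatnonzero(np.diag(S) > 1.0 + alpha * np.sqrt(np.log(p) / n))
    V_hat = np.zeros((p, r))
    if selected.size >= r:
        # Step 2: ordinary PCA on the selected block; eigh sorts eigenvalues ascending.
        _, vecs = np.linalg.eigh(S[np.ix_(selected, selected)])
        V_hat[selected, :] = vecs[:, ::-1][:, :r]
    return V_hat, selected
```

If fewer than r coordinates survive screening, the sketch returns a zero matrix, whose projection loss against a rank-r truth is √r.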

Optimal Hypothesis Testing for High Dimensional Covariance Matrices

Author: Cai Tony
Ma Zongming
Publication venue: ScholarlyCommons
Publication date: 01/12/2013

This paper considers testing a covariance matrix Σ in the high-dimensional setting, where the dimension p can be comparable to or much larger than the sample size n. The problem of testing the hypothesis H0: Σ = Σ0 for a given covariance matrix Σ0 is studied from a minimax point of view. We first characterize, in terms of the Frobenius norm, the boundary that separates the testable region from the non-testable region when the ratio p/n of the dimension to the sample size is bounded. A test based on a U-statistic is introduced and is shown to be rate optimal over this asymptotic regime. Furthermore, it is shown that the power of this test uniformly dominates that of the corrected likelihood ratio test (CLRT) over the entire asymptotic regime under which the CLRT is applicable. The power of the U-statistic based test is also analyzed when p/n is unbounded.

arXiv.org e-Print Archive

ScholarlyCommons@Penn
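
The abstract does not spell out the U-statistic. One natural construction, sketched below in Python with NumPy, assumes mean-zero observations and uses the kernel h(x, z) = (xᵀz)² − xᵀx − zᵀz + p, whose expectation over independent draws with covariance Σ is tr(Σ²) − 2tr(Σ) + p = ‖Σ − I‖²_F. Testing H0: Σ = Σ0 then reduces to the identity case after whitening the data by a Cholesky factor of Σ0. This is an illustrative assumption about the statistic's form, not a reproduction of the paper's test: its centering, normalization and critical values may differ, and the function name is hypothetical. A test built on it would reject H0 for large values.

```python
import numpy as np


def frobenius_u_statistic(X, Sigma0=None):
    """Unbiased U-statistic estimate of ||Sigma - I||_F^2 (hypothetical name).

    Assumption: rows of X are i.i.d. mean-zero with covariance Sigma.
    Kernel: h(x, z) = (x'z)^2 - x'x - z'z + p, averaged over pairs i != j.
    If Sigma0 is given, rows are whitened by L^{-1} with Sigma0 = L L', so the
    target becomes ||L^{-1} Sigma L^{-T} - I||_F^2, which is 0 iff Sigma = Sigma0.
    """
    n, p = X.shape
    if Sigma0 is not None:
        L = np.linalg.cholesky(Sigma0)
        X = np.linalg.solve(L, X.T).T
    G = X @ X.T      # Gram matrix: G[i, j] = x_i' x_j
    d = np.diag(G)   # squared norms x_i' x_i
    # Sum of h(x_i, x_j) over ordered pairs i != j, without explicit loops.
    total = (G**2).sum() - (d**2).sum() - 2 * (n - 1) * d.sum() + n * (n - 1) * p
    return total / (n * (n - 1))
```

The Gram-matrix form costs O(n²p) and avoids a double loop over sample pairs.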

Theoretical Foundations of t-SNE for Visualizing High-Dimensional Clustered Data

Author: Cai T. Tony
Ma Rong
Publication date: 31/10/2022

This paper investigates the theoretical foundations of the t-distributed stochastic neighbor embedding (t-SNE) algorithm, a popular nonlinear dimension reduction and data visualization method. A novel theoretical framework for the analysis of t-SNE based on the gradient descent approach is presented. For the early exaggeration stage of t-SNE, we show its asymptotic equivalence to power iterations based on the underlying graph Laplacian, characterize its limiting behavior, and uncover its deep connection to Laplacian spectral clustering and to fundamental principles such as early stopping as implicit regularization. The results explain the intrinsic mechanism and the empirical benefits of such a computational strategy. For the embedding stage of t-SNE, we characterize the kinematics of the low-dimensional map throughout the iterations and identify an amplification phase, featuring intercluster repulsion and expansive behavior of the low-dimensional map, followed by a stabilization phase. The general theory explains the fast convergence rate and the exceptional empirical performance of t-SNE for visualizing clustered data, offers interpretations of t-SNE visualizations, and provides theoretical guidance for applying t-SNE and selecting its tuning parameters in various applications. Comment: Accepted by Journal of Machine Learning Research

arXiv.org e-Print Archive
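
To make the two stages concrete, the Python sketch below computes the standard t-SNE gradient with an early-exaggeration factor α, namely 4 Σ_j (α p_ij − q_ij)(y_i − y_j)(1 + ‖y_i − y_j‖²)⁻¹, and takes one plain gradient step. As a heuristic reading of the abstract's claim (our own, not taken from the paper): when α is large and the embedding is small, so the kernel weights are close to 1, the attractive term dominates and a step is roughly Y ← (I − 4·lr·α·L)Y, where L = D − P and D is the diagonal matrix of row sums of P. That is one power iteration on a matrix built from the graph Laplacian. The function names are hypothetical, and the sketch omits the momentum, gains and perplexity calibration of P used in practical implementations.

```python
import numpy as np


def tsne_gradient(Y, P, exaggeration=1.0):
    """Gradient of the (exaggerated) t-SNE objective at embedding Y (n x d).

    Assumption: P is a symmetric n x n affinity matrix with zero diagonal
    summing to 1. Returns, for each i,
        4 * sum_j (alpha * p_ij - q_ij) * (y_i - y_j) / (1 + ||y_i - y_j||^2).
    """
    diff = Y[:, None, :] - Y[None, :, :]          # (n, n, d): y_i - y_j
    W = 1.0 / (1.0 + np.sum(diff**2, axis=-1))   # Student-t kernel weights
    np.fill_diagonal(W, 0.0)
    Q = W / W.sum()                               # low-dimensional affinities q_ij
    coeff = (exaggeration * P - Q) * W            # attraction minus repulsion
    return 4.0 * np.einsum("ij,ijd->id", coeff, diff)


def tsne_step(Y, P, lr, exaggeration=1.0):
    """One plain gradient-descent step (no momentum or per-parameter gains)."""
    return Y - lr * tsne_gradient(Y, P, exaggeration)
```

Because P, Q and W are symmetric, the per-point gradients sum to zero, so these steps never move the embedding's centroid.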